90 research outputs found

    Means and covariance functions for geostatistical compositional data: an axiomatic approach

    This work focuses on the characterization of the central tendency of a sample of compositional data. It provides new results about theoretical properties of means and covariance functions for compositional data, with an axiomatic perspective. Original results that shed new light on the geostatistical modeling of compositional data are presented. As a first result, it is shown that the weighted arithmetic mean is the only central tendency characteristic satisfying a small set of axioms, namely continuity, reflexivity and marginal stability. Moreover, this set of axioms also implies that the weights must be identical for all parts of the composition. This result has deep consequences on the spatial multivariate covariance modeling of compositional data. In a geostatistical setting, it is shown as a second result that the proportional model of covariance functions (i.e., the product of a covariance matrix and a single correlation function) is the only model that provides identical kriging weights for all components of the compositional data. As a consequence of these two results, the proportional model of covariance function is the only covariance model compatible with reflexivity and marginal stability.
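
The two objects named in this abstract can be written out as a short sketch. The notation ($\lambda_i$, $\mathbf{z}_i$, $\kappa$, $\boldsymbol{\Sigma}$, $\rho$) and the unit-sum constraint on the weights are assumptions made here for illustration; they are not taken from the paper.

```latex
% Notation assumed for illustration; not taken from the paper.
% Weighted arithmetic mean with ONE scalar weight per sample,
% shared by all D parts of the composition (unit-sum constraint assumed):
\[
  \bar{\mathbf{z}} = \sum_{i=1}^{n} \lambda_i \, \mathbf{z}_i ,
  \qquad \sum_{i=1}^{n} \lambda_i = 1 .
\]
% If every sample is closed to the same constant kappa, so is the mean:
\[
  \mathbf{1}^{\top}\mathbf{z}_i = \kappa \ \ \forall i
  \quad\Longrightarrow\quad
  \mathbf{1}^{\top}\bar{\mathbf{z}}
  = \sum_{i=1}^{n} \lambda_i \, \mathbf{1}^{\top}\mathbf{z}_i
  = \kappa .
\]
% Proportional covariance model: a covariance matrix times a single
% correlation function of the lag h.
\[
  \mathbf{C}(\mathbf{h}) = \boldsymbol{\Sigma}\,\rho(\mathbf{h}) .
\]
```

The middle identity shows why sharing the weights across parts matters: part-specific weights would not, in general, keep the estimate on the same closure constant.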

    Estimating the evidence of selection and the reliability of inference in unigenic evolution

    Background: Unigenic evolution is a large-scale mutagenesis experiment used to identify residues that are potentially important for protein function. Both currently used methods for the analysis of unigenic evolution data analyze 'windows' of contiguous sites, a strategy that increases statistical power but incorrectly assumes that functionally critical sites are contiguous. In addition, both methods require the questionable assumption of asymptotically large sample size due to the presumption of approximate normality. Results: We develop a novel approach, termed the Evidence of Selection (EoS), removing the assumption that functionally important sites are adjacent in sequence and explicitly modelling the effects of limited sample size. Precise statistical derivations show that the EoS score can be easily interpreted as an expected log-odds-ratio between two competing hypotheses, namely, the hypothetical presence or absence of functional selection for a given site. Using the EoS score, we then develop selection criteria by which functionally important yet non-adjacent sites can be identified. An approximate power analysis is also developed to estimate the reliability of inference given the data. We validate and demonstrate the practical utility of our method by analysis of the homing endonuclease I-Bmol, comparing our predictions with the results of existing methods. Conclusions: Our method is able both to assess the evidence of selection at individual amino acid sites and to estimate the reliability of those inferences. Experimental validation with I-Bmol proves its utility for identifying functionally important residues of poorly characterized proteins, demonstrating increased sensitivity over previous methods without loss of specificity. With the ability to guide the selection of precise experimental mutagenesis conditions, our method helps make unigenic analysis a more broadly applicable technique with which to probe protein function. Availability: Software to compute, plot, and summarize EoS data is available as an open-source package called 'unigenic' for the 'R' programming language at http://www.fernandes.org/txp/article/13/an-analytical-framework-for-unigenic-evolution.

    Impact of index hopping and bias towards the reference allele on accuracy of genotype calls from low-coverage sequencing

    Background: Inherent sources of error and bias that affect the quality of sequence data include index hopping and bias towards the reference allele. The impact of these artefacts is likely greater for low-coverage data than for high-coverage data because low-coverage data has scant information and many standard tools for processing sequence data were designed for high-coverage data. With the proliferation of cost-effective low-coverage sequencing, there is a need to understand the impact of these errors and bias on resulting genotype calls from low-coverage sequencing. Results: We used a dataset of 26 pigs sequenced both at 2× with multiplexing and at 30× without multiplexing to show that index hopping and bias towards the reference allele due to alignment had little impact on genotype calls. However, pruning of alternative haplotypes supported by a number of reads below a predefined threshold, which is a default and desired step of some variant callers for removing potential sequencing errors in high-coverage data, introduced an unexpected bias towards the reference allele when applied to low-coverage sequence data. This bias reduced best-guess genotype concordance of low-coverage sequence data by 19.0 absolute percentage points. Conclusions: We propose a simple pipeline to correct the preferential bias towards the reference allele that can occur during variant discovery and we recommend that users of low-coverage sequence data be wary of unexpected biases that may be produced by bioinformatic tools that were designed for high-coverage sequence data.

    Water chemistry: are new challenges possible from CoDA (compositional data analysis) point of view?

    John Aitchison died in December 2016, leaving behind an important inheritance: to continue to explore the fascinating world of compositional data. However, notwithstanding the progress made in this field and the spread of CoDA theory across different research areas, much work remains to be done, particularly in geochemistry. In fact, most papers published in international journals that deal with compositional data ignore the nature of such data and their consequent peculiar statistical properties. On the other hand, when CoDA principles are applied, considerable effort is often made to keep treating the log-ratio transformed variables, for example the centered log-ratio ones, as if they were the original ones, demonstrating a sort of resistance to thinking in relative terms. This is strange behavior, since geochemists are used to ratios, and ratio analysis is the basis of the experimental calibration when standards are evolved to set the instruments. In this chapter some challenges are presented by exploring water chemistry data, with the aim of inviting people to capture the essence of thinking in a relative and multivariate way, since this is the path to a description of natural processes that is as complete as possible.
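
The centered log-ratio transformation mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not code from the chapter: the helper names `closure` and `clr` and the ion concentrations are invented here, and the sketch assumes strictly positive parts (zeros need separate treatment).

```python
import math


def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical major-ion concentrations (mg/L), invented for illustration.
sample_mg_per_l = [40.0, 12.0, 25.0, 3.0]

coords = clr(sample_mg_per_l)
# Changing the units or closing to a constant sum leaves the result unchanged,
# because only the ratios between parts carry information.
rescaled = clr([1000.0 * p for p in sample_mg_per_l])
closed = clr(closure(sample_mg_per_l, total=100.0))
```

The clr coordinates always sum to zero, so they are not free-standing replacements for the original variables; each one expresses a part relative to the geometric mean of all parts, which is the relative reading the abstract argues for.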

    Source patterns of potentially toxic elements (PTEs) and mining activity contamination level in soils of Taltal city (northern Chile)

    Mining activities are among the main sources of potentially toxic elements (PTEs) in the environment, which constitute a real concern worldwide, especially in developing countries. These activities have been carried out for more than a century in Chile, South America, where, as evidence of incorrect waste disposal practices, several abandoned mining waste deposits were left behind. This study aimed to understand the multi-element geochemistry, source patterns and mobility of PTEs in soils of the Taltal urban area (northern Chile). Topsoil samples (n = 125) were collected in the urban area of Taltal city (6 km²), and physicochemical properties (redox potential, electric conductivity and pH) as well as chemical concentrations for 35 elements were determined by inductively coupled plasma optical emission spectrometry. Data were treated following a robust workflow, which included factor analysis (based on ilr-transformed data), a new robust compositional contamination index (RCCI), and fractal/multi-fractal interpolation in a GIS environment. This approach made it possible to identify significant elemental associations, distinguishing pools of elements related to the geological background, to pedogenic processes accompanying soil formation, or to anthropogenic activities. In particular, the study eventually focused on a pool of 6 PTEs (As, Cd, Cr, Cu, Pb, and Zn), their spatial distribution in Taltal city, and the potential sources and mechanisms controlling their concentrations. Results showed generally low baseline values of PTEs in most sites of the surveyed area. At a smaller number of sites, however, higher concentrations of As, Cd, Cu, Zn and Pb were found. These corresponded to a very high RCCI contamination level and were correlated with potential anthropogenic sources, such as the abandoned mining waste deposits in the north-eastern part of Taltal city. This study provided new and significant insight into the contamination levels of Taltal city and their links with anthropogenic activities. Further research is considered crucial to extend this assessment to the entire region. This would provide a comprehensive overview and vital information for the development of intervention limits and would guide environmental legislation for these pollutants in Chilean soils.